
Conversation


@ricardoV94 ricardoV94 commented Mar 31, 2025

This was a failed attempt at an optimization that performs an inplace join of multiple Elemwise outputs: it pre-allocates the joined output buffer and passes views of it to the Elemwise nodes so they write their results inplace, instead of allocating intermediate arrays and copying them into the join.

I could only see meaningful speedups with insanely large inputs.
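The idea can be sketched in plain NumPy (a hand-written illustration, not the PyTensor implementation; the function names are made up): allocate the joined output once and hand each ufunc a view to write into via its `out=` parameter, rather than materializing six temporaries and copying them with `np.concatenate`.

```python
import numpy as np


def join_with_copies(x):
    # baseline: every op allocates its own temporary, then concatenate copies all of them
    return np.concatenate([np.abs(x), -np.abs(x), np.exp(x), np.cos(x), x + 1, x * 2])


def join_inplace(x):
    # pre-allocate the joined output and let each ufunc write into its slice
    n = x.shape[0]
    buf = np.empty(6 * n, dtype=x.dtype)
    np.abs(x, out=buf[0:n])
    np.negative(np.abs(x), out=buf[n : 2 * n])
    np.exp(x, out=buf[2 * n : 3 * n])
    np.cos(x, out=buf[3 * n : 4 * n])
    np.add(x, 1, out=buf[4 * n : 5 * n])
    np.multiply(x, 2, out=buf[5 * n : 6 * n])
    return buf


x = np.random.normal(size=1000)
assert np.allclose(join_with_copies(x), join_inplace(x))
```

The buffered version saves six temporary allocations and one bulk copy, but adds the cost of slicing out six views on every call, which matters at small sizes.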

import pytensor
import pytensor.tensor as pt
from pytensor.compile.mode import get_mode
import numpy as np

x = pt.vector("x", shape=(1000,))
out = pt.join(0, pt.abs(x), -pt.abs(x), pt.exp(x), pt.cos(x), x + 1, x * 2)
out.dprint()
# To compare with the previous behavior, compile with `mode=get_mode(None).excluding("fusion", "inplace_join")`
fn = pytensor.function([x], out, trust_input=True, mode=get_mode(None))
fn.dprint(print_memory_map=True)
fn.dprint(print_memory_map=True)

x_test = np.random.normal(size=x.type.shape)
fn(x_test)
%timeit fn(x_test)
Join [id A]
 ├─ 0 [id B]
 ├─ Abs [id C]
 │  └─ x [id D]
 ├─ Neg [id E]
 │  └─ Abs [id F]
 │     └─ x [id D]
 ├─ Exp [id G]
 │  └─ x [id D]
 ├─ Cos [id H]
 │  └─ x [id D]
 ├─ Add [id I]
 │  ├─ x [id D]
 │  └─ ExpandDims{axis=0} [id J]
 │     └─ 1 [id K]
 └─ Mul [id L]
    ├─ x [id D]
    └─ ExpandDims{axis=0} [id M]
       └─ 2 [id N]
BufferJoin [id A] d={0: [0]} 14
 ├─ AllocEmpty{dtype='float64'} [id B] 0
 │  └─ 6000 [id C]
 ├─ Composite{abs(i0)} [id D] d={0: [1]} 8
 │  ├─ x [id E]
 │  └─ BufferSplit{:stop} [id F] 2
 │     ├─ AllocEmpty{dtype='float64'} [id B] 0
 │     │  └─ ···
 │     └─ 1000 [id G]
 ├─ Composite{(-i0)} [id H] d={0: [1]} 9
 │  ├─ Abs [id I] 1
 │  │  └─ x [id E]
 │  └─ BufferSplit{start:stop} [id J] 3
 │     ├─ AllocEmpty{dtype='float64'} [id B] 0
 │     │  └─ ···
 │     ├─ 1000 [id G]
 │     └─ 2000 [id K]
 ├─ Composite{exp(i0)} [id L] d={0: [1]} 10
 │  ├─ x [id E]
 │  └─ BufferSplit{start:stop} [id M] 4
 │     ├─ AllocEmpty{dtype='float64'} [id B] 0
 │     │  └─ ···
 │     ├─ 2000 [id K]
 │     └─ 3000 [id N]
 ├─ Composite{cos(i0)} [id O] d={0: [1]} 11
 │  ├─ x [id E]
 │  └─ BufferSplit{start:stop} [id P] 5
 │     ├─ AllocEmpty{dtype='float64'} [id B] 0
 │     │  └─ ···
 │     ├─ 3000 [id N]
 │     └─ 4000 [id Q]
 ├─ Composite{(i0 + i1)} [id R] d={0: [2]} 12
 │  ├─ [1.] [id S]
 │  ├─ x [id E]
 │  └─ BufferSplit{start:stop} [id T] 6
 │     ├─ AllocEmpty{dtype='float64'} [id B] 0
 │     │  └─ ···
 │     ├─ 4000 [id Q]
 │     └─ 5000 [id U]
 └─ Composite{(i0 * i1)} [id V] d={0: [2]} 13
    ├─ [2.] [id W]
    ├─ x [id E]
    └─ BufferSplit{start:} [id X] 7
       ├─ AllocEmpty{dtype='float64'} [id B] 0
       │  └─ ···
       └─ 5000 [id U]

Perhaps fusing the BufferSplit nodes into the buffer allocation would make it faster, but that was more work than I was willing to put in.
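A micro-benchmark sketch of why the separate BufferSplit nodes eat the gains at this size (timings are machine-dependent; this only illustrates the shape of the overhead): every call must materialize six view arrays before any arithmetic runs, and at 1000 elements that slicing cost is on the same order as a single elemwise op.

```python
import numpy as np
from timeit import timeit

x = np.random.normal(size=1000)
buf = np.empty(6000)

# cost of materializing the six views (analogous to the BufferSplit nodes)
t_views = timeit(lambda: [buf[i * 1000 : (i + 1) * 1000] for i in range(6)], number=10_000)

# cost of one actual elemwise op at this size, for scale
t_work = timeit(lambda: np.exp(x), number=10_000)

print(f"six views: {t_views:.4f}s  one elemwise: {t_work:.4f}s")
```

As the input grows, the elemwise work scales linearly while the view construction stays constant, which is consistent with only seeing speedups on very large inputs.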


codecov bot commented Mar 31, 2025

Codecov Report

Attention: Patch coverage is 58.13953% with 36 lines in your changes missing coverage. Please review.

Project coverage is 81.96%. Comparing base (0b56ed9) to head (35862ef).
Report is 148 commits behind head on main.

Files with missing lines                     Patch %   Lines
pytensor/tensor/rewriting/basic.py           59.49%    28 Missing and 4 partials ⚠️
pytensor/link/numba/dispatch/subtensor.py    42.85%    4 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #1333      +/-   ##
==========================================
- Coverage   82.01%   81.96%   -0.05%     
==========================================
  Files         203      203              
  Lines       48805    48889      +84     
  Branches     8688     8698      +10     
==========================================
+ Hits        40026    40074      +48     
- Misses       6627     6659      +32     
- Partials     2152     2156       +4     
Files with missing lines                     Coverage Δ
pytensor/link/numba/dispatch/subtensor.py    93.61% <42.85%> (-1.97%) ⬇️
pytensor/tensor/rewriting/basic.py           90.51% <59.49%> (-4.07%) ⬇️

... and 1 file with indirect coverage changes
